Goto

Collaborating Authors

 sq lower bound


Supplementary Material Proofs from Section 2

Neural Information Processing Systems

The proof of Claim 2.3 is obtained via the following calculation, using the definition of Hermite tensor (Definition 2.2). We will use i,j for indexes in [d]. The above is equivalent to Hk(Bx) = B kHk(x). We construct the truncated distribution A as follows. We first sample x A, then we reject x unless x 2 B. Let A be the distribution of the samples we get from this process. Using Markov's inequality and union bound, we have Then it only remains to verify Ex A [Hk(x)] Ex Nm[Hk(x)] 2 for any k < d.





HardnessofNoise-FreeLearningfor Two-Hidden-LayerNeuralNetworks

Neural Information Processing Systems

We give superpolynomial statistical query (SQ) lower bounds for learning twohidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free)model.



SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions

Neural Information Processing Systems

We study the complexity of Non-Gaussian Component Analysis (NGCA) in the Statistical Query (SQ) model. Prior work developed a general methodology to prove SQ lower bounds for this task that have been applicable to a wide range of contexts. In particular, it was known that for any univariate distribution A satisfying certain conditions, distinguishing between a standard multivariate Gaussian and a distribution that behaves like A in a random hidden direction and like a standard Gaussian in the orthogonal complement, is SQ-hard. The required conditions were that (1) A matches many low-order moments with the standard univariate Gaussian, and (2) the chi-squared norm of A with respect to the standard Gaussian is finite. While the moment-matching condition is necessary for hardness, the chi-squared condition was only required for technical reasons. In this work, we establish that the latter condition is indeed not necessary. In particular, we prove near-optimal SQ lower bounds for NGCA under the moment-matching condition only. Our result naturally generalizes to the setting of a hidden subspace. Leveraging our general SQ lower bound, we obtain near-optimal SQ lower bounds for a range of concrete estimation tasks where existing techniques provide sub-optimal or even vacuous guarantees.



SQ Lower Bounds for Learning Mixtures of Linear Classifiers

Neural Information Processing Systems

Our main result is a Statistical Query (SQ) lower bound suggesting that known algorithms for this problem are essentially best possible,even for the special case of uniform mixtures.In particular, we show that the complexity of any SQ algorithm for the problem is $n^{\mathrm{poly}(1/\Delta) \log(r)}$,where $\Delta$ is a lower bound on the pairwise $\ell_2$-separation between the $\mathbf{v}_{\ell}$'s.The key technical ingredient underlying our result is a new construction of spherical designs on the unit sphere that may be of independent interest.


SQ Lower Bounds for Learning Single Neurons with Massart Noise

Neural Information Processing Systems

We study the problem of PAC learning a single neuron in the presence of Massart noise. Specifically, for a known activation function $f: \mathbb{R}\to \mathbb{R}$, the learner is given access to labeled examples $(\mathbf{x}, y) \in \mathbb{R}^d \times \mathbb{R}$, where the marginal distribution of $\mathbf{x}$ is arbitrary and the corresponding label $y$ is a Massart corruption of $f(\langle \mathbf{w}, \mathbf{x} \rangle)$. The goal of the learner is to output a hypothesis $h: \mathbb{R}^d \to \mathbb{R}$ with small squared loss. For a range of activation functions, including ReLUs, we establish super-polynomial Statistical Query (SQ) lower bounds for this learning problem. In more detail, we prove that no efficient SQ algorithm can approximate the optimal error within any constant factor. Our main technical contribution is a novel SQ-hard construction for learning $\{ \pm 1\}$-weight Massart halfspaces on the Boolean hypercube that is interesting on its own right.